Secure data joins using a secure join key in a multiple tenant database system

ABSTRACT

Systems, methods, and devices for generating a secure join of database data are disclosed. A method generates a secure user defined function (UDF) that includes a one-way hash. The method uses the secure UDF to convert datapoints of a first account and datapoints of a second account into a secure join key, which is unidentifiable to the first account and the second account based on the one-way hash. The method then determines a count value of overlapping datapoints between the first account and the second account based on the secure join key.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No.17/900,759, filed on Aug. 31, 2022, entitled “SECURE DATA POINT MATCHINGIN A MULTIPLE TENANT DATABASE SYSTEM,” which is a continuation of U.S.application Ser. No. 17/847,681 filed on Jun. 23, 2022, now U.S. Pat.No. 11,461,493 issued on Oct. 4, 2022, entitled “DATA OVERLAP COUNTADJUSTMENT IN A MULTIPLE TENANT DATABASE SYSTEM,” which is acontinuation of U.S. application Ser. No. 17/521,348 filed on Nov. 8,2021, now U.S. Pat. No. 11,386,222 issued on Jul. 12, 2022, entitled“DATA OVERLAP COUNT IN A MULTIPLE TENANT DATABASE SYSTEM,” which is acontinuation of U.S. application Ser. No. 17/321,327 filed on May 14,2021, now U.S. Pat. No. 11,170,125 issued on Nov. 9, 2021, entitled“DATA OVERLAP COUNT IN A MULTIPLE TENANT DATABASE SYSTEM,” which is acontinuation of U.S. application Ser. No. 17/004,375 filed on Aug. 27,2020, now U.S. Pat. No. 11,048,817 issued on Jun. 29, 2021, entitled“SHARE BASED DATA OVERLAP COUNT IN A MULTIPLE TENANT DATABASE SYSTEM,”which is a continuation of U.S. application Ser. No. 16/902,266 filed onJun. 16, 2020, now U.S. Pat. No. 10,783,271 issued on Sep. 22, 2020,entitled “SECURE VIEW-BASED DATA JOINS IN A MULTIPLE TENANT DATABASESYSTEM,” which is a continuation of U.S. application Ser. No. 16/708,067filed on Dec. 9, 2019, now U.S. Pat. No. 10,713,380 issued on Jul. 14,2020, entitled “SECURE DATA JOINS IN A MULTIPLE TENANT DATABASE SYSTEM,”which is a continuation of U.S. application Ser. No. 16/368,339 filed onMar. 28, 2019, now U.S. Pat. No. 11,188,670 issued on Nov. 30, 2021,entitled “SECURE DATA JOINS IN A MULTIPLE TENANT DATABASE SYSTEM,” theentire contents of which are hereby incorporated by reference.

TECHNICAL FIELD

The present disclosure relates to databases and more particularlyrelates to secure joins of database data.

BACKGROUND

Databases are widely used for data storage and access in computingapplications. A goal of database storage is to provide enormous sums ofinformation in an organized manner so that it can be accessed, managed,and updated. In a database, data may be organized into rows, columns,and tables. Different database storage systems may be used for storingdifferent types of content, such as bibliographic, full text, numeric,and/or image content. Further, in computing, different database systemsmay be classified according to the organization approach of thedatabase. There are many different types of databases, includingrelational databases, distributed databases, cloud databases,object-oriented and others.

Databases are used by various entities and companies for storinginformation that may need to be accessed or analyzed. In an example, aretail company may store a listing of all sales transactions in adatabase. The database may include information about when a transactionoccurred, where it occurred, a total cost of the transaction, anidentifier and/or description of all items that were purchased in thetransaction, and so forth. The same retail company may also store, forexample, employee information in that same database that might includeemployee names, employee contact information, employee work history,employee pay rate, and so forth. Depending on the needs of this retailcompany, the employee information and the transactional information maybe stored in different tables of the same database. The retail companymay have a need to “query” its database when it wants to learninformation that is stored in the database. This retail company may wantto find data about, for example, the names of all employees working at acertain store, all employees working on a certain date, all transactionsfor a certain product made during a certain time frame, and so forth.

When the retail store wants to query its database to extract certainorganized information from the database, a query statement is executedagainst the database data. The query returns certain data according toone or more query predicates that indicate what information should bereturned by the query. The query extracts specific data from thedatabase and formats that data into a readable form. The query may bewritten in a language that is understood by the database, such asStructured Query Language (“SQL”), so the database systems can determinewhat data should be located and how it should be returned. The query mayrequest any pertinent information that is stored within the database. Ifthe appropriate data can be found to respond to the query, the databasehas the potential to reveal complex trends and activities. This powercan only be harnessed through the use of a successfully executed query.

In certain implementations of database technology, differentorganizations or companies may wish to securely link or join theirdatabase data. Further to the above example, the retail store may wishto link or share some of its data with outside organizations, such as aproduct vendor, a healthcare provider for its employees, a shippingcompany, and so forth. However, the retail store would want to ensurethat its data was secure and that the outside organizations could notview all of its data with unrestricted access. The retail store may alsowish to enable outside organizations to link, join, and/or analyze itsdata without permitting the outside organizations to view or export rawdata. Depending on the content of the data, it can be imperative toensure that the data is secure due to privacy concerns, contractualagreements, government agency restrictions, and so forth. For example,personally identifiable information (PII), protected health information(PHI), and other forms of fine-grained data may need to remain secureeven when such database data is shared with outside organizations.

In database systems, secure views may be used as a security mechanism torestrict access to specific information stored in the database. A secureview may be specifically designated for data privacy to limit access tosensitive data, such as PII or PHI, that should not be exposed tooutside organizations and/or all users of the database. Theimplementation of views, and how the implementation of views is handled,can potentially lead to information leakage. For example, during queryoptimization, certain filters may be pushed across the view definitioncloser to input tables and information may be leaked to a user if auser-specified filter is evaluated before secure predicates areevaluated. Secure views can ensure that the security of a regular viewcannot be circumvented by clever querying of data that is stored in theregular view.

In some instances, two or more organizations may wish to join data tomake certain determinations about data that is common between the two ormore organizations, or for one organization to enrich the data of theother. For example, two companies may wish to determine how manycustomers the two companies have in common. This may be a common inquirybetween, for example, buyers and sellers of advertising, betweenhealthcare payers and providers, and so forth. This can be a challengingquestion to answer without one party exposing its entire customer listto the other party. The customer list may include sensitive informationthat should not be shared with the other party and/or it may includeinformation that the sharing party does not wish to expose forcontractual or business reasons. The two parties may wish to securelyjoin data so the parties may determine common data between the twoparties or make other beneficial determinations, or to enrich eachother's data, without exposing all underlying data. Disclosed herein aremethods, systems, and devices for securely joining database data.

BRIEF DESCRIPTION OF THE DRAWINGS

Non-limiting and non-exhaustive implementations of the presentdisclosure are described with reference to the following figures,wherein like reference numerals refer to like or similar partsthroughout the various views unless otherwise specified. Advantages ofthe present disclosure will become better understood with regard to thefollowing description and accompanying drawings where:

FIG. 1 is a schematic block diagram illustrating accounts in amulti-tenant database, according to one embodiment;

FIG. 2 is a schematic diagram illustrating a system for providing andaccessing database services, according to one embodiment;

FIG. 3 is a schematic diagram illustrating a multi-tenant database withseparation of storage and computing resources, according to oneembodiment;

FIG. 4 is a schematic block diagram illustrating object hierarchies,according to one embodiment;

FIG. 5 is a schematic diagram illustrating role-based access, accordingto one embodiment;

FIG. 6 is a schematic diagram illustrating a usage grant between roles,according to one embodiment;

FIG. 7 is a schematic diagram illustrating a share object, according toone embodiment;

FIG. 8 is a schematic diagram illustrating cross-account grants,according to one embodiment;

FIG. 9 is a schematic block diagram illustrating components of a sharecomponent, according to one embodiment;

FIG. 10 is a schematic diagram of a system for generating a secure joinacross database accounts, according to one embodiment;

FIG. 11 is a schematic diagram of a system for generating a secure joinacross database accounts, according to one embodiment;

FIG. 12 is a schematic block diagram of work distribution for generatinga secure join across database accounts, according to one embodiment;

FIG. 13 is a schematic block diagram of work distribution for generatinga secure join across database accounts, according to one embodiment;

FIG. 14 is a schematic block diagram illustrating a method forgenerating a secure join across database accounts, according to oneembodiment;

FIG. 15 is a schematic block diagram illustrating a method forgenerating a secure join across database accounts, according to oneembodiment; and

FIG. 16 is a block diagram depicting an example computing device orsystem consistent with one or more embodiments disclosed herein.

DETAILED DESCRIPTION

In certain instances, two or more organizations or companies may wish toshare data, join data, or enrich data. In an example of joining data,two companies may wish to determine how many customers the two companieshave in common. This can be a difficult inquiry to answer without one orboth parties exposing its entire customer list to the other party. Thecustomer list might include sensitive information such as personallyidentifiable information or protected health information, or thecustomer list itself might constitute valuable intellectual propertythat should not be shared with other parties. In such an instance, itmay be valuable to securely compare data stored in separate data storesthat are associated with the two companies, without allowing eithercompany to view certain information about the data itself or how thedata is stored.

Disclosed herein are systems, methods, and devices for joining databasedata between two parties in a secure manner that does not exposesensitive information included in the data itself or sensitiveinformation about structural or organization details about how the datais stored. Such methods and systems for joining database may herein bereferred to as a secure join. In an embodiment, a method includesdetermining data stored in a first account to be compared with datastored in a second account, wherein the data stored in the first accountand the data stored in the second account are directed to a same subjectmatter. In an exemplary embodiment, the first account is associated witha first company and the second account is associated with a secondcompany. The first company and the second company wish to comparecustomer lists to determine which customers, and/or how many customers,the two companies have in common. The method includes defining auser-defined function including procedural logic for determining anoverlap count between the data stored in the first account and the datastored in the second. The user-defined function further includes as aninput a secure join key. The secure join key is a hash that may includea salted value from each account, wherein the one or more salted valuesare salted before being hashed. The user-defined function may be definedby either of the first account or the second account and may be sharedwith the other account such that it may be run by a compute nodeassociated with either of the first account or the second account. Theuser-defined function, along with the secure join key, is configured tocompare data stored by the first account against data stored by thesecond account to determine which datapoints, and/or how manydatapoints, the two accounts have in common.

In an example, a first account (referred to in this example as theProvider) will share a secure user-defined function (UDF) that permits asecond account (referred to in this example as the Consumer) to check ifa specific datapoint exists in the Provider's database. For purposes ofthis example, the specific datapoint will refer to a customeridentification number that should not otherwise be shared between theaccounts. The secure UDF will return a one if there is a match betweenthe Consumer's list of customer identification numbers and theProvider's list of customer identification numbers. The secure UDF willreturn a zero if there is not a match. The secure UDF may be used aspart of a SQL statement to compare all datapoints between the Provider'scustomer identification numbers and the Consumer's customeridentification numbers to determine all matches. The Consumer connectsto the Provider's shared secure UDF The secure UDF is used to countcommon customers using the Provider's secure direct lookup function.Because the secure UDF can securely access the Provider's data withoutexposing it to the Consumer, and because the Provider cannot see whichdatapoints the Consumer looks up, the process is secure. However, thereis a risk that the Consumer can know with certainty if any particularcustomer for which the Consumer is aware of is a customer of theProvider. This property may not be desirable in some implementations asit allows the Consumer to probe the Provider's customer base for anyparticular known customer.

Further to the above example, a different implementation is providedthat may be referred to herein as a secure join. The Provider defines asecure user-defined function (UDF) to generate a secure join key. In anembodiment, the secure join key is a hash that includes salted data fromthe Provider and salted data from the Consumer. The Provider's data maybe salted and the Consumer's data may be salted before either data ishashed to generate the secure join key. In an embodiment, the securejoin key may further include a customer account number associated withthe Provider and/or the Consumer. The customer account number may beassociated with a cloud-based database system that generates the securejoin key. The secure UDF is shared with the Consumer along with anotherStructure Query Language (SQL) secure UDF to perform a count ofoverlapping customer datapoints between the Provider and the Consumer.The secure UDF uses a one-way hash to convert the Consumer's customerdatapoints into a meaningless string. The secure UDF may run inside theConsumer's compute nodes. The secure UDF may run on compute nodes ofeither company such that the other company does not attain visibilityinto when or whether the secure UDF was run. In an example, the Providerincludes an account identification associated with the Consumer as partof the one-way hash. This ensures that hashes cannot be used to measuredata overlap with any other account or provider. The Consumer may usethis hash approach for row-by-row or bulk matching to compare thecustomer datapoints stored in the Provider's data store versus thecustomer datapoints stored in the Consumer's data store.

Further to the above example, a different implementation is providedthat may be referred to herein as a reverse share secure join. In thisimplementation, the Consumer may create a secure view of its owncustomer datapoints and run the secure view through a hash function suchthat the customer datapoints are encoded and “salted.” When a datapointis salted, additional information (i.e. the salt) is added to thedatapoint. The salt is only known by the owner of that datapoint. In anexample, the datapoint is customer's first name, for example thedatapoint is “John.” The salted datapoint will have a salt adhered to itthat is only known by the owner of the datapoint. In the example, thesalted datapoint may read “JohnSALT123.” It should be appreciated thatthe salt may include any string of characters that is known only to theowner of the datapoint. The Consumer may share only the salted hashvalues with the Provider in a reverse share. In this example, theConsumer may include additional salted hash values that do not representany real customer datapoints. This may prevent the Provider fromdeducing any information about the true size of the Consumer's customerbase by inspecting the contents of the reverse share. This technique maybe referred to as “noise injection” and does not harm the analysis ofthe common customer datapoints existing between the Provider and theConsumer. After the Consumer shares the salted hashed customerdatapoints with the Provider, the Consumer may call a secure SQL UDF toperform a secure join. The Consumer provides the hash value and the saltvalue to the secure SQL UDF such that the secure SQL UDF may process thesalted hashed customer datapoints. The secure SQL UDF may then hash andsalt the Provider's customer datapoints to generate the secure join keyand return the distinct count of joint customer datapoints between theProvider and the Consumer.

Disclosed herein are systems, methods, and devices for creating a securejoin of database data. A first party and a second party may wish todetermine common data between the two parties. The first party and thesecond party may wish to determine all common datapoints of a certainsubject matter, or of a certain column identifier, between a data storeassociated with the first account and a data store associated with thesecond account. The first party may generate a secure UDF and share itwith the second party. The secure UDF may include, in an exampleembodiment, a SQL script that is configured to count overlapping databetween the two parties. The first party may define the secure UDF touse a one-way hash to convert the second party's sensitive data into ameaningless string to generate a secure join key. This secure UDF mayrun inside the second party's compute nodes such that the first partydoes not know that the secure UDF was run or what data was passedthrough the secure UDF. The first party may further include an accountidentification as part of the one-way hash that is associated with thesecond party account. When the second party's account identification isincluded as part of the one-way hash, the secure UDF cannot be used tomeasure data overlap with anyone other than the two parties, and in someimplementations may not be used to measure data overlap with any accountoutside the same database of the first account. The first party maygenerate and share the secure UDF according to the following example SQLscript:

create or replace secure functionget_secure_join_key(customer_key_string string) returns string as $$sha2(customer_key_string ∥ sha2(current_account( ) ∥ ‘provider secretsalt’)) $$; create or replace secure functionsecure_lookup(secure_join_key string, consumer_salt string) returnsnumeric(10) as $$ select count(*) as count_custs from provider_customersc where secure_join_key = sha2(nvl(C_EMAIL_ADDRESS,to_char(random( ))) ∥consumer_salt ∥ sha2(current_account( ) ‘provider secret salt’)) $$;grant usage on function get_secure_join_key(string) to shareSJ_DEMO_SHARE; grant usage on function secure_lookup(string, string) toshare SJ_DEMO_SHARE;

Further to this example, the second party may use the hash approach todetermine matching between the two parties one row at a time, or toperform bulk matching. Either method may give the same security benefitbecause the first party does not have visibility into how or when thefirst party's secure UDF is called by the second party. Additionally,the first party cannot see what data was run through the secure UDF bythe second party. This provides enhanced security benefits over the useof a secure UDF when attempting to discern matching data between twoaccounts. The following example SQL script illustrates how the secondparty may set up a share from the first party and perform a count ofcommon datapoints between the second party and the first party.

Consumer Account

create customer key string function create or replace functionget_customer_key_string(c_email_address string, consumer_salt string)returns string as $$ nvl(C_EMAIL_ADDRESS,to_char(random( )))∥consumer_salt $$; --make table of secure join keys (this is needed insome cases UDF's can not call other UDFs within them). create or replacetemporary table joinkeys as selectsj_test.public.get_secure_join_key(get_customer_key_string(C_EMAIL_ADDRESS,‘consumersalt’)) secure_join_key from consumer_customers c; --runsecure join overlap count selectsum(sj_test.public.secure_lookup(secure_join_key,‘consumersalt’))overlapping_customers, count(*) total_customers, ((overlapping_customers/ total_customers ) * 100 ) overlap_pct from joinkeys; Rowoverlapping_customers total_customers overlap_pct 1 570761 292653519.503000

In an embodiment, the second party may provide a “reverse share” toprovide additional security. In this embodiment, the second party maycreate a secure view of its data and run the data in the secure viewthrough a hash function. The salted hash values that have been runthrough the hash function may be shared with the first party in thereverse share. The second party may call an SQL UDF to perform a securejoin and provide a salt value for the hashed data points. This SQL UDFmay return a distinct count of joint data points between the secondparty and the first party. In an embodiment, the SQL UDF runs on thesecond party's compute resources such that the first party does not haveany visibility into whether the query was run or into the results of thequery. Likewise, the second party is only able to see how manyoverlapping data points exist between the second party and the firstparty. The second party is unable to see any information about thecontent or the size of the first party's non-overlapping data points.

Further in a reverse share embodiment, the second party has the optionto include additional hash values that do not represent any actual datapoints in the database. The additional hash values may prevent the firstparty from deducing anything about the size of the second party's actualdata by inspecting the contents of the reverse share. This technique maybe referred to as a “noise injection” and does not harm analysis of theshared data.

One potential risk the first party faces is the second partyintentionally probing the first party with successive requests and onlyaltering one row with each request. The second party may do this todetermine whether a specific data point is present in the first party'sdataset. In an implementation where this is a concern, the first partymay include additional logic in the counting UDF to protect against thisraw-row probing by the second party. The first party may instruct thecounting UDF to perform a distinct row reduction on the reverse shareinput set. The first party may instruct the counting UDF to enforce aminimum number of distinct rows on the reverse share inputs. The firstparty may instruct the counting UDF to add a random numeric value thatrandomly varies the returned result by, for example, plus or minus ten.

The first party may instruct the counting UDF to round off the actualdistinct overlapping result to a certain integer, for example to thenearest ten by using SQL mathematical functions. The first party mayinstruct the counting UDF to enforce a minimum number of overlap “hits”for the UDF to return a non-zero result through the use of a SQL HAVINGclause. One or more of these instructions may provide Provider-sidenoise to the result such that raw row probing by the second party ismore difficult.

A user-defined function (UDF) is a function that may be provided by auser of a program, database, or other environment. As discussed herein,a secure UDF refers to a UDF having a secure definition. In a relationaldatabase management system, a UDF may provide a mechanism for extendingfunctionality of the database server by adding a function that may beevaluated in, for example, structured query language (SQL) statements.The SQL standard may distinguish between scalar UDFs and table UDFs. Ascalar UDF may return only a single value whereas a table UDF may returna relational table comprising zero or more rows where each row may haveone or more columns. Once created, a UDF may be used in expressions inSQL statements.

Each user-defined function includes properties and characteristics. Ascalar function returns a single data value (i.e. not a table) with areturn clause. In an embodiment, scalar functions may utilize all scalardata types with the exception of user-defined data types. A table-valuedfunction may return the result of a single select statement and amulti-statement table-valued function may return a table.

In embodiments disclosed herein, where a user-defined function is usedfor securely analyzing data, the user-defined function may have a secureuser-defined function definition. The secure UDF may be created by afirst account and shared with one or more secondary accounts. The firstaccount may grant usage privileges to the one or more secondary accountssuch that the secondary accounts may, for example, utilize thefunctionality of the UDF without viewing any procedural logic making upthe UDF.

With respect to secure views, some optimizations for views requireaccess to underlying data in the base tables for the view. Access to theunderlying data may permit data that is hidden from users of the view tobe exposed by, for example, UDFs or other programmatic methods. Secureviews as disclosed in the present application do not utilize suchoptimizations and ensure that users have no access to the underlyingdata. According to one embodiment, the query expression to create astandard view (may also be referred to as the view definition or text)may be visible to users. In such an embodiment, a user may not wish toexpose the underlying tables or internal structural details for a viewdue to security or privacy reasons. With a secure view as disclosedherein, the view definition and details are only visible to authorizedusers i.e. users that have been granted the role that owns the view.

A view may be defined as a secure view when it is specificallydesignated for data privacy or to limit access to data that should notbe exposed to all accounts of the underlying table. Data might beexposed in a non-secure view when, for example, an account has access toonly a subset of data. An example scenario where information may beleaked includes the following. A database table includes employee datafor an entire company and a view is defined to expose employee data fora single department. An account, for example belonging to a head of thesingle department, is granted access to the view but not the underlyingtable. In such an embodiment, a user may gain insight into data fromother departments by querying on the view. If the filter in the query isevaluated before the join between the “employee” field and the“department” field, the user may deduce whether anyone in the entirecompany satisfies a certain metric (e.g. has a salary over x amount,etc.). Such a technique may be utilized to query any data within thebase table. A user-defined function that carries a state acrossinvocations can be utilized in a similar way to leak the results of evenmore sophisticated queries on the base table data.

Secure views permit database client accounts to expose a restricted dataset to other users without the possibility of the underlying,unrestricted data being exposed to those other users. Similar to views,user-defined functions (UDFs) permit a client account to exposerestricted data to other users, with additional capabilities due toargument processing and procedural language support. Secure UDFs mayprovide the same security guarantees as secure views.

In an embodiment, a secure view provides several security guaranteeswhen compared against a regular view. In an embodiment, the secure viewdoes not expose the view definition to non-owners of the view. Thisimpacts various operations that access the data dictionary. In anembodiment, the secure view does not expose information about anyunderlying data of the view, including the amount of data processed bythe view, the tables accessed by the view, and so forth. This impactsthe statistics that are displayed about the number of bytes andpartitions scanned in a query, and what is displayed in the queryprofile for a query referring to a secure view. In an embodiment, thesecure view does not expose data from tables accessed by the view whichis filtered out by the view. In such an embodiment, a client accountassociated with a non-secure view may access data that would be filteredout by taking advantage of query optimizations that may cause userexpressions to be evaluated before security expressions (e.g. filtersand joints). In such an embodiment, to achieve this requirement, the setof query optimizations that can be applied to a query containing asecure view may be restricted to guarantee that the user expressionsthat can leak data are not evaluated before the view is filtered.

Multi-tenant databases or multi-tenant data warehouse support multipledistinct customer accounts at once. As an example, FIG. 1 is a schematicblock diagram illustrating a multi-tenant database or data warehousethat supports many different customer accounts A1, A2, A3, An, etc.Customer accounts may be separated by multiple security controls,including different uniform resource locators (URLs) to connect to,different access credentials, different data storage locations (such asAmazon Web Services S3 buckets), and different account-level encryptionkeys. Thus, each customer may only be allowed to see, read, and/or writethe customer's own data. By design it may be impossible for a customerto see, read, or write another customer's data. In some cases, strictseparation of customer accounts is the backbone of a multi-tenant datawarehouses or database system.

In some cases, it may be desirable to allow cross-account data sharing.However, no current multi-tenant database system allows sharing of databetween different customer accounts in an instantaneous, zero-copy,easy-controllable fashion.

Based on the foregoing, disclosed herein are systems, methods, anddevices that, in one embodiment, may be implemented for generating asecure join in a multi-tenant online database system. Some embodimentsallow the implementation of secure joins, secure UDFs, and/or secureviews between different customer accounts and may make the datainstantly accessible with no need to copy data. Some embodiments provideaccess to data using fine-grained controls to maintain separation ofdesired data while allowing access to data that a customer wishes toshare.

Embodiments disclosed herein provide systems, methods, and devices forsharing a “share object” or “database object” between a sharer accountand one or more other accounts in a database system. The share object ordatabase object in one implementation may include procedural logic thatis defined by a user of a client account (in one implementation, by auser of the sharer account). The share object may be supported in scalarand table-valued user-defined functions (UDFs) and may be defined by anysuitable language. In an implementation, the procedural logic of theshare object may be used by one or more other accounts withoutpermitting the one or more other accounts to view the underlying codedefining the procedural logic. The share object or database object mayfurther include database data such as data stored in a table of thedatabase. The share object or database object may include metadata aboutdatabase data such as minimum/maximum values for a table ormicro-partition of a database, underlying structural or architecturaldetails of the database data, and so forth. The share object may includea listing of all other accounts that may receive cross-account accessrights to elements of the share object. The listing may indicate, forexample, that a second account may use procedural logic of the shareobject without seeing any underlying code defining the procedural logic.The listing may further indicate, for example, that a third account mayuse database data of one or more tables without seeing any structuralinformation or metadata about the database data. The listing mayindicate any combination of usage privileges for elements of the shareobject, including whether secondary accounts may see metadata orstructural information for database data or procedural logic.

The embodiments disclosed herein may be supported in scalar andtable-valued user-defined functions (UDFs). Various languages may beutilized including SQL and Javascript for each of the scalar andtable-valued UDFs.

In an embodiment, a scalar SQL UDF includes a SQL expression thatincluding parameters that may be replaced with arguments appearing inthe function invocation. The expression may include a query expression(i.e. a subquery), and in an embodiment, the query expression must beguaranteed to return a single row and a single column. At compile-time,a function invocation is replaced with the SQL expression and parametersin the expression are replaced with the invocation's arguments. Theresult may be a simple SQL expression or a subquery (may be correlatedor non-correlated). In an embodiment, because a scalar SQL UDF mayinclude arbitrary SQL expressions, the secure UDF may be configured to(1) not expose the view definition to non-owners of the view and (2) notexpose information about any underlying data of the view.

In an embodiment, a table-valued SQL UDF is similar to a view whereinthe query may include parameters. As a result, table-valued UDFs mayappear as correlated tables and may be decorrelated during querycompilation. In an embodiment, because a table-valued SQL UDF caninclude arbitrary SQL expressions, the secure UDFs may be configured to(1) not expose the view definition to non-owners of the view, (2) notexpose information about any underlying data of the view, and (3) notexpose data from tables accessed by the view which is filtered out bythe view.

In an embodiment, a scalar Javascript UDF is converted to a built-infunction that evaluates a piece of Javascript code for reach row that isprocessed and passes arguments from those rows into the function. In anembodiment, because a Javascript UDF cannot access database tables, theUDF may be configured to (1) not expose information about any underlyingdata of the view and (2) not expose data from tables accessed by theview which is filtered out by the view.

In an embodiment, a table-valued Javascript UDF is processed such thatrows are grouped together and the Javascript code is evaluated per groupand row processed by passing expressions from rows into the function. Inan embodiment, because a Javascript UDF cannot access database tables,the UDF may be configured to (1) not expose information about anyunderlying data of the view and (2) not expose data from tables accessedby the view that are filtered out by the view.

Embodiments disclosed herein provide technology which may be used fordetermining common datapoints between different customer accounts in anonline database and/or online data warehouse. In one embodiment, asecure join to determine common datapoints is enabled usingcross-account role grants that allow users in one customer account toaccess data in another customer account in a controlled andinstantaneous fashion by way of a secure view and/or a secure UDF,without the need to copy data. For example, the shared data may remainwithin the sharing account while allowing a foreign account to run asecure UDF on the shared data.

A detailed description of systems and methods consistent withembodiments of the present disclosure is provided below. While severalembodiments are described, it should be understood that this disclosureis not limited to any one embodiment, but instead encompasses numerousalternatives, modifications, and equivalents. In addition, whilenumerous specific details are set forth in the following description toprovide a thorough understanding of the embodiments disclosed herein,some embodiments may be practiced without some or all these details.Moreover, for the purpose of clarity, certain technical material that isknown in the related art has not been described in detail to avoidunnecessarily obscuring the disclosure.

Referring now to the figures, FIG. 1 is a schematic block diagramillustrating a multi-tenant database or data warehouse that supportsmany different customer accounts A1, A2, A3, An, etc. Customer accountsmay be separated by multiple security controls, including differentuniform resource locators (URLs) to connect to, different accesscredentials, different data storage locations (such as Amazon WebServices S3 buckets), and different account-level encryption keys. Thus,each customer may only be allowed to see, read, and/or write thecustomer's own data. By design it may be impossible for a customer tosee, read, or write another customer's data. In some cases, strictseparation of customer accounts is the backbone of a multi-tenant datawarehouses or database system.

FIG. 2 is a schematic diagram of a system 200 for providing andaccessing database data or services. The system 200 includes a databasesystem 202, one or more servers 204, and a client computing system 206.The database system 202, the one or more servers 204, and/or the clientcomputing system 206 may communicate with each other over a network 208,such as the Internet. For example, the one or more servers 204, and/orthe client computing system 206 may access the database system 202 overthe network 208 to query a database and/or receive data from a database.The data from the database may be used by the one or more servers 204 orclient computing system 206 for any type of computing application. Inone embodiment, the database system 202 is a multi-tenant databasesystem hosting data for a plurality of different accounts.

The database system 202 includes a share component 210 and storage 212.The storage 212 may include storage media for storing data. For example,the storage 212 may include one or more storage devices for storingdatabase tables, schemas, encryption keys, data files, or any otherdata. The share component 210 may include hardware and/or software forenabling the cross-account sharing of data or services and/or forassociating view privileges with data or services. For example, theshare component 210 may define a secure UDF and/or a secure view suchthat two or more accounts may determine common datapoints by way of thesecure UDF without revealing the datapoints themselves or any otherdatapoints that are not common between the accounts. Further forexample, the share component 210 may process queries/instructionsreceived from remote devices to access shared data or share data. Thequeries/instructions may be received from the one or more servers 204 orclient computing system 206. In one embodiment, the share component 210is configured to allow sharing data between accounts without creatingduplicate copies of tables, data, or the like outside the sharingaccount. For example, the share component may allow for computerresources allocated to a sharing account to perform any queries orinstructions provided by a foreign account.

In one embodiment, the share component 210 defines a user-definedfunction that includes procedural logic for determining an overlap countbetween data stored in a first account and data stored in a secondaccount. The data stored in the first account and the data stored in thesecond account may be identified by, for example, a resource manager orglobal services associated with each of the two accounts. The sharecomponent 210 may share the user-defined function with the secondaccount such that the second account may run the user-defined functionon one or more compute nodes to determine common datapoints between thefirst account and the second account.

In one embodiment, storage and compute resources for a database system100 may be logically and/or physically separated. In one embodiment,storage is a common, shared resource across all accounts. Computeresources may be set up independently, per account, as virtualwarehouses. In one embodiment, a virtual warehouse is a set of computenodes that access data in a storage layer and compute a query result.Separating the compute nodes or resources from the storage allowsscaling each layer independently. Separation of storage and compute alsoallows that shared data can be processed independently by differentaccounts, without the computation in one account affecting thecomputation in other accounts. That is, in at least some embodiments,there is no contention among computing resources when running queries onshared data.

FIG. 3 is a schematic block diagram of a multi-tenant database 300illustrating separation of storage and computing resources. For example,the multi-tenant database 300 may be a data warehouse where a pluralityof different accounts (A1, A2, A3, through An) are hosted. In FIG. 3 ,account A1 has three virtual warehouses running, account A2 has onevirtual warehouse running, and account A3 has no virtual warehouserunning. In one embodiment, all these virtual warehouses have access tothe storage layer that is separated from the compute nodes of thevirtual warehouses. In one embodiment, virtual warehouses can bedynamically provisioned or removed based on a current workload for anaccount.

In one embodiment, a multi-tenant database system 300 uses objecthierarchies in accounts. For example, each customer account may containobject hierarchies. Object hierarchies are often rooted in databases.For example, databases may contain schemas and schemas, in turn, maycontain objects such as tables, views, sequences, file formats, andfunctions. Each of these objects serves a special purpose: tables storerelational or semi-structured data; views define logical abstractionsover the stored data; sequences provide means to generateever-increasing numbers; file formats define ways to parse ingested datafiles; and functions hold user-defined execution procedures. Inembodiments as disclosed herein, views may be associated with secureuser-defined function definitions such that underlying data associatedwith the view is hidden from non-owner accounts who have access to theview.

FIG. 4 is a schematic block diagram illustrating object hierarchieswithin customer accounts. Specifically, accounts may include hierarchiesof objects which may be referenced in a database. For example, customeraccount A1 contains two databases objects D1 and D2. Database object D1contains schema object S1, which in turn contains table object T1 andview object V1. Database object D2 contains schema object S2, whichcontains function object F2, sequence object Q2, and table object T2.Customer account A2 contains a database object D3 with schema object S3and table object T3. The object hierarchies may control how objects,data, functions, or other information or services of an account ordatabase system are accessed or referenced.

In one embodiment, a database system implements role-based accesscontrol to govern access to objects in customer accounts. In general,role-based access control consists of two basic principles: roles andgrants. In one embodiment, roles are special objects in a customeraccount that are assigned to users. Grants between roles and databaseobjects define what privileges a role has on these objects. For example,a role that has a usage grant on a database can “see” this database whenexecuting the command “show databases”; a role that has a select granton a table can read from this table but not write to the table. The rolewould need to have a modify grant on the table to be able to write toit.

FIG. 5 is a schematic block diagram illustrating role-based access toobjects in customer accounts. A customer account A1 contains role R1,which has grants to all objects in the object hierarchy. Assuming thesegrants are usage grants between R1 and D1, D2, S1, S2 and select grantsbetween R1 and T1, V1, F2, Q2, T2, a user with activated role R1 can seeall objects and read data from all tables, views, and sequences and canexecute function F2 within account A1. Customer account A2 contains roleR3, which has grants to all objects in the object hierarchy. Assumingthese grants are usage grants between R3 and D3, S3, and select a grantbetween R3 and T3, a user with activated role R3 can see all objects andread data from all tables, views, and sequences within account A2.

FIG. 6 illustrates a usage grant between roles. With role-based accesscontrol, it is also possible to grant usage from one role to anotherrole. A role that has a usage grant to another role “inherits” allaccess privileges of the other role. For example, in role R2 has a usagegrant on role R1. A user (e.g., with corresponding authorizationdetails) with activated role R2 can see and read from all objectsbecause role R2 inherits all grants from role R1.

FIG. 7 is a schematic block diagram illustrating a share object SH1. Inan embodiment, the share object is a column of data across one or moretables and cross-account access rights are granted to the share objectsuch that one or more other accounts may determine common datapointsbetween data stored in the share object and data stored in the one ormore other accounts. Customer account A1 contains share object SH1.Share object SH1 has a unique name “SH1” in customer account A1. Shareobject SH1 contains role R4 with grants to database D2, schema S2, andtable T2. The grants on database D2 and schema S2 may be usage grantsand the grant on table T2 may be a select grant. In this case, table T2in schema S2 in database D2 would be shared read-only. Share object SH1contains a list of references to other customer accounts, includingaccount A2.

After a share object is created, the share object may be imported orreferenced by a target account listed in the share object. For example,importing a share object from a sharer account is possible from othercustomer accounts. A target account may run a command to list allavailable share objects for importing. Only if a share object wascreated with references that included the target account, the targetaccount may list the share object and subsequently import it. In oneembodiment, references to a share object in another account are alwaysqualified by account name. For example, target account A2 wouldreference share SH1 in sharer account A1 with the example qualified name“A1.SH1”.

In one embodiment, a process or importing a share object may include:creating an alias object in the target account; linking the alias objectwith the top-most shared object in the sharer account in the objecthierarchy; granting a role in the target account usage privileges to thealias object; and granting the target account role usage privileges tothe role contained in the share object.

In one embodiment, a target account that imports the share object ordata creates an alias object. An alias object is similar to a normalobject in a customer account. An alias object has its own unique namewith which it is identified. An alias object may be linked to thetop-most object in each object hierarchy that is shared. If multipleobject hierarchies are shared, multiple alias objects may be created inthe target account. Whenever an alias object is used (e.g., reading fromthe alias object, writing to the alias object), the alias object isinternally replaced by the normal object in the sharer account to whichit links. This way, alias objects are merely proxy objects of normalobjects, and not duplicate objects. Thus, when reading from or writingto an alias object, the operations affect the original object that thealias links to. Like normal objects, when an alias object is created itis granted to the activated role of the user.

In addition to the alias object, a grant between a role in the targetaccount and the role contained in the share object is created. This is ausage grant from role to role across customer accounts. Role-basedaccess control now allows a user in the target account to access objectsin the sharer account.

FIG. 8 is a schematic block diagram illustrating logical grants andlinks between different accounts. A database alias object D5 is createdin account A2. Database alias D5 references database D2 via link L1.Role R3 has a usage grant G1 on database D5. Role R3 has a second usagegrant G2 to role R4 in customer account A1. Grant G2 is a cross-accountgrant between accounts A1 and A2. In one embodiment, role-based accesscontrol allows a user in account A2 with activated role R3 to accessdata in account A1. For example, if a user in account A2 wants to readdata in table T2, role-based access control allows that because role R3has a usage grant of role R4 and role R4, in turn, has a select grant ontable T2. By way of illustration, a user with activated role R3 mayaccess T2 by running a query or selection directed to “D5.S2.T2”.

Using object aliases and cross-account grants from a role in the targetaccount to a role in the sharer account allows users in the targetaccount to access information in the sharer account. In this way, adatabase system may enable sharing of data between different customeraccounts in an instantaneous, zero-copy, easy-controllable fashion. Thesharing can be instantaneous because alias objects and cross-accountgrants can be created in milliseconds. The sharing can be zero-copybecause no data must be duplicated in the process. For example, allqueries, or selections can be made directly to the shared object in thesharer account without creating a duplicate in the target account. Thesharing is also easy to control because it utilizes easy-to-usetechniques of role-based access control. Additionally, in embodimentswith separated storage and compute, there is no contention amongcomputing resources when executing queries on shared data. Thus,different virtual warehouses in different customer accounts mayindividually process shared data. For example, a first virtual warehousefor a first account may process a database query or statement using datashared by a sharer account and a second virtual warehouse for a secondaccount, or the sharer account, may process a database query orstatement using the shared data of the sharer account.

FIG. 9 is a schematic block diagram of a share component 210. The sharecomponent 210 includes a cross-account rights component 902, an aliascomponent 904, a request component 906, an access component 908, aprocessing component 910, and a secure view component 912. Thecomponents 902-912 are given by way of example only and may not all beincluded in all embodiments. For example, each of the components 902-912may be included in or may be implemented as part of a separate device orsystem.

The cross-account rights component 902 is configured to create andmanage rights or grants between accounts. The cross-account rightscomponent 902 may generate a share object in a sharer account. Forexample, a user of the sharer account may provide input indicating thatone or more resources should be shared with another account. In oneembodiment, the user may select an option to create a new share objectso that resources can be shared with foreign accounts. In response tothe user input, the cross-account rights component 902 may create ashare object in the sharer account. The share object may include a roleto which access rights can be granted to resources for sharing with aforeign account. The foreign account may include a customer account orother account that is separate from the sharer account. For example, theforeign account may be another account hosted on a multi-tenant databasesystem.

Upon creation, the share object may be granted rights to one or moreresources within the sharer account. The resources may include adatabase, a schema, a table, a sequence, or a function of the shareraccount. For example, the share object may contain a role (i.e., sharerole) which is granted right to read, select, query, or modify a datastorage object, such as a database. The share object, or share role inthe share object, may be granted rights similar to how rights may begranted to other roles using role-based access control. A user may beable to access an account and grant rights to the share role so that theshare role can access resources that are meant to be shared with foreignaccounts. In one embodiment, the share object may include a list ofobjects, and an access level, for which the share role has rights.

The share object may also be made available or linked to specificforeign accounts. For example, the share object may store a list ofaccounts that have rights to the share role or share object in thesharer account. A user with the sharer account may add or removeaccounts to the list of accounts. For example, the user may be able tomodify the list to control which accounts can access objects shared viathe share object. Foreign accounts listed or identified in the shareobject may be given access to resources with access rights granted to ashare role of the share object. In one embodiment, a specific accountcan perform a search to identify share objects or sharer accounts thathave been shared with the specific account. A list of available shareobjects can be viewed by a user of the target or specific account.

The alias component 904 is configured to generate an alias for data or adata object shared by a separate account. For example, the alias objectmay create, in a target account, an alias object corresponding to ashared resource shared by a sharer account. In one embodiment, the aliasobject is created in response to a target account accepting a sharedresource or trying to access a shared resource for the first time. Thealias object may act as an alias for a data object for the highestobject hierarchy shared by the sharer account (see, e.g., FIG. 8 whereD5 is an alias for D2). The alias component 904 may also generate a linkbetween the alias object and a shared object (see, e.g., FIG. 8 where L1is the link between D5 and D2). The link may be created and/or stored inthe form of an identifier or name of the original or “real” object. Forexample, the link L1 in FIG. 8 may include an identifier for D2 storedin the alias object D5 that includes a unique system wide name, such as“A1.D2”.

The alias component 904 may also grant a role in the target account (theaccount with which the sharer account has shared data or resources)access rights to the alias object (see, e.g., G1 of FIG. 8 ).Additionally, the alias component 904 may also grant the role in thetarget account to a share role in the share object of the sharer account(see, e.g., G2 of FIG. 8 ). With the alias object created, a linkbetween the alias object and an object in the sharer account, and grantsto a role in the target account, the target account may be free to runqueries, statements, or “see” shared data or resources in the shareraccount.

The request component 906 is configured to receive a request from anaccount to access a shared resource in a different account. The requestmay include a database query, select statement, or the like to access aresource. In one embodiment, the request includes a request directed toan alias object of the requesting account. The request component 906 mayidentify a resource with which the alias object is linked, such as adatabase or table in a sharer account. The request component 906 mayidentify the linked object based on an identifier of the alias object.

The request component 906 may further be configured to receive a requestfrom an account to count common datapoints between two accounts. Therequest component 906 may be associated with a first account and mayreceive a request from a second account to generate a secure joinbetween the two accounts and determine how many, and which, datapointsare shared between the two accounts. The datapoints may be of a singlesubject matter or column identifier or may be of multiple subjectmatters.

The access component 908 is configured to determine whether an accounthas access to a shared resource of a different account. For example, ifa first account requests access to a resource of a different, secondaccount, the access component 908 may determine whether the secondaccount has granted access to the first account. The access component908 may determine whether a requesting account has access by determiningwhether a share object identifies the requesting account. For example,the access component 908 may check if the requesting account is presentin a list of accounts stored by a share object. The access component 908may also check whether the share object that identifies the requestingaccount has access rights (e.g., grants) to the targeted data resourcein the sharer account.

In one embodiment, the access component 908 may check for the presenceof a grant from a share role in a sharer account to a requesting role inthe requesting account. The access component 908 may check whether alink exists between an alias object to which a database request orstatement was directed or whether a grant exists between a requestingrole and the alias object. For example, the access component 908 maycheck for the existence or presence of one or more of L1, G1 and G2illustrated in FIG. 8 . Furthermore, the access component 908 may checkfor a grant between a role in a share object to an object (such as atable or database) of the sharer account. For example, the accesscomponent 908 may check for the existence of a grant between the role R4and the database D2 in FIG. 8 . If the access component 908 determinesthat the requesting account has access to the shared resource, therequest may be fulfilled by the share component 210 or a processingcomponent 910. If the access component 908 determines that therequesting account does not have rights to the requested data or object,the request will be denied.

The processing component 910 is configured to process database requests,queries, or statements. The processing component 910 may process andprovide a response to a request from an account to access or use data orservices in another account. In one embodiment, the processing component910 provides a response to a request by processing the request usingoriginal data in a sharer account that is different from the requestingaccount. For example, the request may be directed toward a database ortable stored in or for a first account and the processing component 910may process the request using the database or table of the first accountand return a response to a requesting, second account.

In one embodiment, the processing component 910 performs processing ofshared data without creating a duplicate table or other data source inthe requesting account. Generally, data must be first ingested into anaccount that wishes to process that data or perform operations againstthe data. The processing component 910 may save processing time, delay,and/or memory resources by allowing a target account to access sharedresources in a sharer account without creating a copy of a data resourcein the target account.

The processing component 910 may perform processing of the same datausing different processing resources for different accounts. Forexample, a first virtual warehouse for a first account may process adatabase query or statement using data shared by a sharer account and asecond virtual warehouse for a second account, or the sharer account,may process a database query or statement using the shared data of thesharer account. Using separate processing resources to process the samedata may prevent contention for processing resources between accounts.The processing resources may include dynamically provisioned processingresources. In one embodiment, processing of shared data is performedusing a virtual warehouse for the requesting account even though thedata may be in storage for a different account.

The secure view component 912 is configured to define a secure view fora share object, a data field of a share object, a data field of adatabase object, and so forth. In an embodiment, the secure viewcomponent 912 defines the secure view using a SECURE keyword in a viewfield and may set or unset the SECURE property on a view using an ALTERVIEW command. In various embodiments, the secure view component 912 mayimplement such commands only at the manual direction of a client accountor may be configured to automatically implement such commands. Thesecure view component 912 may alter the parser to support the securekeyword before the view name and the new alter view rule. In anembodiment, the alter view rule may be more general to incorporatefurther view-level attributes. In terms of metadata support, the viesmay effectively be stored as tables, and the change may involve alteringa table data persistence object that includes a secure flag indicatingwhether the view is a secure view (this may be implemented in additionto the view text comprising the secure tag). The secure user-definedfunction definition (i.e. the table data persistence object) may behidden from users that are not the owner of the view. In such anembodiment, a command to show views will return results as usual to theowner of the view but will not return the secure user-defined functiondefinition to a non-owner second account that has access to the view.

The secure view component 912 may alter transformations of a parse tree,e.g. view merging and predicate information. The canonicalimplementation may include annotating query blocks such that the queryblocks are designated as coming from secure view. In such animplementation, the query blocks cannot be combined with external queryblocks (e.g. view merging) or expressions (e.g. via filter pushdown).

The secure view component 912 may rewrite the query plan tree duringoptimization e.g. during filter pullup and/or filter pushdown. Thesecure view component 912 may be configured to ensure that no expressionthat does not stem from a secure view can be pushed down below the viewboundaries. The secure view component 912 may be configured to achievethis by implementing a new type of projection that behaves identicallyto a standard projection but, since it is not a standard projection,fails to match any of the rewrite rule preconditions. As a result, therelevant rewrites are not applied. The secure view component 912 may beconfigured to identify what type of projection is to be generated (e.g.a standard projection or a secure projection) after query blocks havebeen designated as coming from a secure user-defined function definitionor not.

The secure view component 912 is configured to optimize performance forsecure views in a zero-copy data sharing system. In various embodimentsknown in the art, secure views are known to cause a loss of performancethat may effectively cripple the optimizer from applying certaintransformations. Such embodiments might be improved by deeming certaintransformations as safe, where safety indicates that the operationsbeing transformed will not have any side effects on the system. Suchside effects may be caused by a user defined function (UDF) thatperforms operations that cannot readily identify unsafe operations, oroperations that can fail and reveal information about the data valuethat caused the failure (e.g. when dividing by zero or some similaroperation). The secure view component 912 may annotate expressions withthe expression's safety properties and then enable transformations thatallow an expression to be pushed through a secure view boundary if theexpression is deemed safe. The expression may be deemed safe if theexpression is known to produce no errors and the expression does notcontain a user defined function (UDF). The secure view component 912 maydetermine whether the expression produces errors by utilizing anexpression properties framework where the expression properties store anindication whether an expression may produce errors.

FIG. 10 is a schematic diagram of a system 1000 for generating a securejoin between two accounts. The secure join takes place between aprovider 1002 and a receiver 1006. It should be appreciated that theterms “provider” and “receiver” are illustrative only and mayalternatively be referred to as a first account and a second account, asa sharer account and a target account, as a provider and a consumer, andso forth. In an embodiment, the provider 1002 and the receiver 1006 aredifferent accounts associated with the same cloud-based databaseadministrator. In an embodiment, the provider 1002 and the receiver 1006are associated with different cloud-based and/or traditional databasesystems. The provider 1002 includes a provider execution platform 1004having one or more execution nodes capable of executing processing taskson the database data of the provider 1002, wherein the database data isstored in a data store associated with the provider 1002. Similarly, thereceiver 1006 includes a receiver execution platform 1008 having one ormore execution nodes capable of executing processing tasks on thedatabase data of the receiver 1006, wherein the database data is storedon a data store associated with the receiver 1006. The data store mayinclude cloud-based scalable storage such that the database data isspread across a plurality of shared storage devices accessible by anexecution platform such as 1004 or 1008. The secure join between theprovider 1002 and the receiver 1006 is implemented with a secureuser-defined function 1010 (“secure UDF”). The secure UDF 1010 includesprogramming logic and/or a hash key for generating a secure join key1012. The secure UDF 1010 may be generated by the provider 1002, thereceiver 1006, a database administration system, and/or a third party.The secure UDF 1010 may be run on the provider execution platform 1004and/or the receiver execution platform 1008. In an embodiment, thesecure UDF 1010 is only run on the receiver execution platform 1008 andthe provider 1002 has no visibility into whether or when the secure UDF1010 was run by the receiver 1006.

In an embodiment, the secure UDF 1010 is defined by the provider 1002and made available to the receiver 1006. The secure UDF 1010 isconfigured to generate the secure join key 1012 for joining dataassociated with the provider 1002 with data associated with the receiver1006. The secure join key 1012 is a hash string. In an embodiment, thesecure join key 1012 is a one-way hash that cannot be read by either ofthe provider 1002 or the receiver 1006 to discern the original data. Thesecure join key 1012 includes data from the provider 1002 and data fromthe receiver 1006. In an embodiment, the data from the provider 1002and/or the data from the receiver 1006 is salted before running thesecure UDF 1010 to generate the secure join key 1012. In such anembodiment, the salted value is hashed by the secure UDF 1010 and cannotbe read by the other party. In an embodiment, a database administratorimplements the secure UDF 1010 to generate the secure join key 1012 andthe database administrator is solely capable of reverting the securejoin key 1012 to the original data. This introduces additional securitybenefits that may be necessary in certain implementations, for examplewhere the data includes sensitive information such as personallyidentifiable information and/or personal health information.

In an embodiment, the original data from the provider 1002 and/or theoriginal data from the receiver 1006 is salted before the secure joinkey is generated. The salt may include any information or charactersthat are only known to the owner of the data, i.e. the provider 1002 orthe receiver 1006. A salted datapoint may include the original datapointwith a salt adhered to it. By way of example, if the original datapointis a customer's name such as “Susan,” the salted datapoint may readsomething like “SusanSALT123” where “SALT123” represents any string ofcharacters (i.e., the salt) that is known only by the owner of the data.In an embodiment, the salt may be known to the owner of the data and mayfurther be known to a database system or provider that is responsiblefor generating the secure join key, implementing the secure UDF, and/orstoring database data.

In an embodiment, the secure UDF 1010 is in communication with and canaccess the data of the provider 1002 and is run on the receiverexecution platform 1008. The secure UDF 1010 includes hashing code andgenerates the secure join key 1012. The secure UDF 1010 performs theon-way hash on the provider's 1002 data to generate the secure join key1012. In an embodiment, an additional secure user-defined function isconfigured to perform analysis at 1014 that is based on the secure joinkey. Example analyses that may be performed based on the secure join key1012 includes a count of overlapping datapoints at 1016, anidentification of overlapping datapoints at 1018, and/or an analysis onshared datapoints at 1020. A number of different analyses may beperformed to determine similarities or differences between theprovider's 1002 data and the receiver's 1006 data. Each of the differentanalyses may be performed by a different secure user-defined functionthat may be programmed by the provider 1002, the receiver 1006, a thirdparty, a system administrator, and so forth.

The system 1000 is configured such that neither of the provider 1002 northe receiver 1006 has any visibility into the original data stored inthe other account. The original data is hashed by the secure UDF 1010using a one-way hash such that the original data is converted into anunidentifiable stream of hashed data and cannot be reverted by theprovider 1002 or the receiver 1006 to determine the original data.

An additional secure user-defined function may perform any of the one ormore analyses based on the secure join key at 1014. In an embodiment, asecure user-defined function performs a count of overlapping datapointsbased on the hashed streams that make up the secure join key 1012. Thesecure UDF 1010 may generate a separate hash stream of data from theprovider 1002 and data from the receiver 1006, and the collective hashstreams may be referred to as the secure join key 1012. The secureuser-defined function that is configured to perform the count ofoverlapping datapoints may perform a row-by-row count or a bulk count todetermine how many and which datapoints are the same between theprovider 1002 and the receiver 1006. This secure user-defined functionmay be configured to return only a yes/no answer for each pair of hasheddatapoints to determine whether the pair of hashed datapoints is thesame and therefore the provider 1002 and the receiver 1006 have amatching original datapoint. This secure user-defined function mayreturn a listing of the identities of all matching datapoints and/or atotal count of how many matching datapoints exist between the provider1002 and the receiver 1006.

FIG. 11 is a schematic diagram of a system 1100 for a reverse sharesecure join between a provider 1102 and a receiver 1106. The reverseshare provides a secure option for the receiver 1106 to generate asecure view 1114 of its data that may be provided to the receiver 1106such that the receiver 1106 may run a secure user-defined function 1110(“secure UDF”) on the provider's 1102 data. The secure UDF 1110 includeshashing code such that it is configured to take data from the provider1102 and/or the receiver 1106 and hash the data to generate a securejoin key 1112. The secure join key 1112 may be used to perform variousanalyses on the provider's 1102 and the receiver's 1106 data, such as todetermine how many and which datapoints are common between the provider1102 and the receiver 1106.

In an embodiment, the secure UDF 1110 is only run on the receiverexecution platform 1108 and is not run on any compute nodes owned oroperated by the provider 1102. In such an embodiment, the provider 1102has no visibility into whether or when the secure UDF 1110 was run bythe receiver. In the system 1100, the secure UDF 1110 may be defined bythe provider 1102, by the receiver 1106, by a third party, or by adatabase system administrator or database provider. The secure UDF 1110generates the secure join key 1112 by hashing data from the provider1102 and the receiver 1106.

In a reverse share as illustrated in FIG. 11 , the data provided to thesecure UDF 1110 may be salted. The salted data may be hashed to generatethe secure join key 1112. This provides additional security benefits sothe other party cannot determine the original data. For example, theprovider 1102 may adhere a salt to each of its datapoints and providethose salted datapoints to the secure view 1114 to be read by the secureUDF 1110. When the secure UDF 1110 hashes the salted data points togenerate the secure join key 1112, each datapoint will have additionalcharacters that have also been hashed. This hash string includes theoriginal data and the salt, and both have been hashed such that theother party (the receiver 1106) cannot determine the provider's 1102actual data.

For example, the provider 1102 and the receiver 1106 may be accountsassociated with retail business. The two business may have stored theircustomer base on a database and may wish to determine how many and whichcustomers the two business have in common. The customer base may includepersonal identifiable information about each customer, such as name,address, telephone number, and so forth. Each business (the provider1102 and the receiver 1106) may have an agreement with their customersto ensure such information remains private. Additionally, the provider1102 and the receiver 1106 may not wish the other party to know whotheir customers are and may only wish to know how many customers theyhave in common. In the example, the provider 1102 may define the secureUDF 1110 with hashing code such that the secure UDF 1110 can generate asecure join key 1112. The secure join key is an unidentifiable hashstring of the provider's 1102 data and the receiver's data 1106. Thesecure UDF 1110 may be run on the receiver execution platform 1108 suchthat the provider 1102 does not have any visibility into when or if thesecure UDF 1110 was run. The provider 1102 may salt its data values tofurther disguise the original data from the receiver. For example, onedatapoint owned by the provider 1102 may be a telephone numberassociated with one of the provider's 1102 customers. The provider 1102may salt the telephone number (e.g. 801-555-5555) by adding a “salt” tothe telephone number. The salted telephone number may have anycharacters added to it to disguise the original datapoint. By way ofexample, the salted telephone number may read SALT1238015555555SALT987.It should be appreciated that the salt added to the datapoints may beany suitable string of characters and may be known only by the saltingparty. The provider 1102 may provide the salted data values to thereceiver 1106 by way of the secure view 1114. The provider's 1102 salteddata values will be hashed to generate the secure join key 1112. Becausethe receiver 1106 does not have any knowledge of the salt, the receiver1106 cannot discern any of the original data values by de-hashing thesecure join key 1112. The receiver 1106 may run the secure UDF 1110 onthe receiver execution platform 1108 to generate the secure join key1112, and the receiver 1106 may further run one or more additionaluser-defined functions to analyze the data that is hashed in the securejoin key 1112. The receiver 1106 may wish to know, for example, how manycustomers the parties have in common, which customers the parties havein common, and so forth.

In an embodiment as illustrated in FIG. 11 , the receiver 1106 salts itsdata and may further hash its data before providing the data to thesecure UDF 1110. The receiver 1106 may provide salted hash values of itsdata to the secure UDF 1110 that was defined by the provider 1102. Thesecure UDF 1110 generates the secure join key 1112 based on hashing codeand on the salted hash values received from the receiver 1106. Thesecure join key 1112 may be provided to the provider 1102 by way of thesecure view 1114.

In an embodiment, the receiver 1106 may include additional salted hashvalues that do not represent any real receiver datapoints. This mayprevent the provider 1102 from deducing anything about the size of thereceiver's 1106 actual datastore by inspecting the contents of thereverse share. This technique may be referred to as “noise injection”and does not harm analytics performed on the data such as the count ofcommon datapoints between the provider 1102 and the receiver 1106.

FIG. 12 is a schematic block diagram of an exemplary embodiment of workdistribution 1200 for a secure join between a provider 1202 and areceiver 1206. The work distribution 1200 is split amongst the provider1202, the secure user-defined function (UDF) 1204, and the receiver1206. In an embodiment, the secure UDF 1204 is run on an executionplatform belonging to the receiver 1206. The secure UDF 1204 may bedefined by the provider 1202 as shown in FIG. 12 or it may be defined bythe receiver 1206, a third party, a database provider, a databaseadministrator, and so forth. When the secure UDF 1204 is run on anexecution platform associated with the receiver, the provider may beprohibited from knowing whether or when the secure UDF 1204 was run.

The provider 1202 defines at 1208 the secure UDF and generates at 1210 asecure join key. The provider 1202 shares at 1212 the secure UDF withthe receiver. The receiver 1206 receives at 1214 the share of the secureUDF from the provider 1202. The secure UDF 1204 hashes at 1216 theprovider's data using the secure join key. The secure UDF 1204 hashes at1218 the receiver's data with the secure join key. The secure UDF 1204performs at 1220 a secure join overlap count between the hashedprovider's data and the hashed receiver's data. The secure UDF 1204provides at 1222 the secure join overlap count to the receiver.

FIG. 13 is a schematic block diagram of an exemplary embodiment of workdistribution 1300 for a reverse share secure join between a provider1302 and a receiver 1306. The work distribution 1300 is split amongstthe provider 1302, the receiver 1306, and a secure user-defined function(UDF) 1304. The secure UDF 1304 is run on an execution platformbelonging to the receiver 1306. When the secure UDF 1304 is run on anexecution platform associated with the receiver 1306, the provider 1302may be prohibited from knowing whether or when the secure UDF 1304 wasrun.

The provider 1302 defines at 1308 a secure UDF definition. Defining thesecure UDF definition includes generating at 1310 a secure join key. Theprovider 1302 shares at 1312 the secure UDF including the secure joinkey generator with the receiver 1306. The receiver 1306 receives at 1314the share of the secure UDF. The receiver 1306 generates at 1316 areceiver key that may include a hash key. The receiver 1306 generates at1318 salted datapoints using the receiver key. The receiver 1306 sharesat 1320 the salted datapoints with the secure UDF. The secure UDF 1304hashes at 1322 the provider's data using the secure join key. The secureUDF 1304 receives at 1324 the receiver key and decrypts at 1326 thesalted datapoints with the receiver key. The secure UDF 1304 hashes at1328 the receiver's data using the secure join key. The secure UDF 1304performs at 1330 a secure join overlap count of the hashed provider'sdata and the hashed receiver's data. The secure UDF 1304 provides at1332 the secure join overlap count to the receiver. The secure UDF 1304may further provide the secure join overlap count to the provider.

FIG. 14 is a schematic flow chart diagram of a method 1400 forgenerating a secure join of database data. The method 1400 may beimplemented in a multi-tenant database system such as that illustratedin FIG. 1 . The method may be performed by any suitable computing devicesuch as a share component 210 as disclosed herein.

The method 1400 begins and the computing device determines at 1402 datastored in a first account to be compared with data stored in a secondaccount. The data stored in the first account and the data stored in thesecond account may be directed to the same subject matter. In anexemplary implementation, the first account and the second account maybe associated with the same multi-tenant database system and mayrepresent different client accounts within that multi-tenant databasesystem. The first account and the second account may wish to determineall common datapoints of a certain subject matter between the twoaccounts. For example, the first account and the second account may beassociated with companies wishing to determine all customers that thetwo companies have in common. The companies may be storing customer datawith certain personally identifiable information such as, for example, atelephone number for each customer, a social security number for eachcustomer, an address for each customer, and so forth. Such personallyidentifiable information may be sensitive and should not be shared withany entity or person outside of the company to which the customerprovided the information. In this implementation, the two companies maywish to securely compare all datapoints of a certain type, such as allcustomer phone numbers, all customer social security numbers, allcustomer addresses, and so forth. The companies may wish to compare suchdatapoints without permitting the other company to view the personallyidentifiable information for any customers and/or a total number ofcustomers. The method 1400 continues and the computing device determinesat 1404 a function for generating a secure join key, wherein the securejoin key comprises a hashed string that hashes one or more of a dataentry of the first account or a data entry of the second account. Thefunction may be defined by either of the first account or the secondaccount, a third party, or a database administrator or provider. Themethod 1400 continues and the computing devices provides at 1406 thesecure join key to the first account and/or the second account.

FIG. 15 is a schematic flow chart diagram of a method 1500 forgenerating a secure join of database data. The method 1500 may beimplemented in a multi-tenant database system such as that illustratedin FIG. 1 . The method may be performed by any suitable computing devicesuch as a share component 210 as disclosed herein.

The method 1500 begins and the computing device determines at 1502 datastored in a first account to be compared with data stored in a secondaccount. The computing device determines at 1504 a function forgenerating a secure join key, wherein the secure join key comprises ahashed string that hashes one or more of a data entry of the firstaccount or a data entry of the second account. The computing deviceprovides at 1506 the secure join key to the first account and/or thesecond account. The computing device compares at 1508 the data stored inthe first account with the data stored in the second account based onthe secure join key by identifying common data entries between the firstaccount and the second account. The computing device enriches at 1510the data stored in the first account and/or the data stored in thesecond account based on common data entries between the first accountand the second account.

FIG. 16 is a block diagram depicting an example computing device 1600.In some embodiments, computing device 1600 is used to implement one ormore of the systems and components discussed herein. Further, computingdevice 1600 may interact with any of the systems and componentsdescribed herein. Accordingly, computing device 1600 may be used toperform various procedures and tasks, such as those discussed herein.Computing device 1600 can function as a server, a client or any othercomputing entity. Computing device 1600 can be any of a wide variety ofcomputing devices, such as a desktop computer, a notebook computer, aserver computer, a handheld computer, a tablet, and the like.

Computing device 1600 includes one or more processor(s) 1602, one ormore memory device(s) 1604, one or more interface(s) 1606, one or moremass storage device(s) 1608, and one or more Input/Output (I/O)device(s) 1610, all of which are coupled to a bus 1612. Processor(s)1602 include one or more processors or controllers that executeinstructions stored in memory device(s) 1604 and/or mass storagedevice(s) 1608. Processor(s) 1602 may also include various types ofcomputer-readable media, such as cache memory.

Memory device(s) 1604 include various computer-readable media, such asvolatile memory (e.g., random access memory (RAM)) and/or nonvolatilememory (e.g., read-only memory (ROM)). Memory device(s) 1604 may alsoinclude rewritable ROM, such as Flash memory.

Mass storage device(s) 1608 include various computer readable media,such as magnetic tapes, magnetic disks, optical disks, solid statememory (e.g., Flash memory), and so forth. Various drives may also beincluded in mass storage device(s) 1608 to enable reading from and/orwriting to the various computer readable media. Mass storage device(s)1608 include removable media and/or non-removable media.

I/O device(s) 1610 include various devices that allow data and/or otherinformation to be input to or retrieved from computing device 1600.Example I/O device(s) 1610 include cursor control devices, keyboards,keypads, microphones, monitors or other display devices, speakers,printers, network interface cards, modems, lenses, CCDs or other imagecapture devices, and the like.

Interface(s) 1606 include various interfaces that allow computing device1600 to interact with other systems, devices, or computing environments.Example interface(s) 1606 include any number of different networkinterfaces, such as interfaces to local area networks (LANs), wide areanetworks (WANs), wireless networks, and the Internet.

Bus 1612 allows processor(s) 1602, memory device(s) 1604, interface(s)1606, mass storage device(s) 1608, and I/O device(s) 1610 to communicatewith one another, as well as other devices or components coupled to bus1612. Bus 1612 represents one or more of several types of busstructures, such as a system bus, PCI bus, IEEE 1394 bus, USB bus, andso forth.

For purposes of illustration, programs and other executable programcomponents are shown herein as discrete blocks, although it isunderstood that such programs and components may reside at various timesin different storage components of computing device 1600 and areexecuted by processor(s) 1602. Alternatively, the systems and proceduresdescribed herein can be implemented in hardware, or a combination ofhardware, software, and/or firmware. For example, one or moreapplication specific integrated circuits (ASICs) can be programmed tocarry out one or more of the systems and procedures described herein. Asused herein, the terms “module” or “component” are intended to conveythe implementation apparatus for accomplishing a process, such as byhardware, or a combination of hardware, software, and/or firmware, forthe purposes of performing all or parts of operations disclosed herein.

EXAMPLES

The following examples pertain to further embodiments.

Example 1 is a system for performing a secure join of database data, thesystem comprising. The system includes means for determining data storedin a first account to be compared with data stored in a second account.The system includes means for determining a function for generating asecure join key, wherein the secure join key comprises a hash stringthat hashes one or more of: a data entry of the first account; a firstsalt string associated with the first account; a first accountidentification number associated with the first account; a data entry ofthe second account; a second salt string associated with the secondaccount; or a second account identification number associated with thesecond account; and means for providing the secure join key to the firstaccount and/or the second account.

Example 2 is a system as in Example 1, wherein the secure join keycomprises a double salted hash value and the function is configured togenerate the secure join key based on: the data entry of the firstaccount with the first salt string associated with the first account;and the data entry of the second account with the second salt stringassociated with the second account; wherein original non-hashed ornon-salted data cannot be identified by the first account or the secondaccount based on the secure join key.

Example 3 is a system as in any of Examples 1-2, further comprisingmeans for comparing the data stored in the first account with the datastored in the second account based on the secure join key, wherein themeans for comparing is configured to one or more of: identify commondata entries between the first account and the second account; determinea number of common data entries between the first account and the secondaccount; or enrich the data stored in the first account and/or the datastored in the second account based on common data entries between thefirst account and the second account.

Example 4 is a system as in any of Examples 1-3, wherein the functiongenerates the secure join key further based on a salted data entryreceived from the second account that does not refer to any real dataentry in the second account and is included to insert noise into thesecure join key.

Example 5 is a system as in any of Examples 1-4, further comprisingmeans for determining a comparison function that comprises procedurallogic having instructions for comparing the data stored in the firstaccount with the data stored in the second account, the instructionscomprising: comparing a first hashed datapoint from the first accountwith a second hashed datapoint from the second account, wherein thefirst hashed datapoint and the second hashed datapoint are generated bythe function; determining whether the first hashed datapoint is the sameas the second hashed datapoint; and in response to determining that thefirst hashed datapoint and the second hashed datapoint are the same,generating an indication that the first account and the second accounthave a common datapoint.

Example 6 is a system as in any of Examples 1-5, wherein theinstructions further comprise, in response to determining that the firsthashed datapoint and the second hashed datapoint are the same: revertingthe first hashed datapoint to a first original datapoint; reverting thesecond hashed datapoint to a second original datapoint; and enrichingthe data stored in the first account and/or the data stored in thesecond account with the first original datapoint and/or the secondoriginal datapoint.

Example 7 is a system as in any of Examples 1-6, wherein the functionfor generating the secure join key is defined by the first account andis executed on an execution platform associated with the second account.

Example 8 is a system as in any of Examples 1-7, wherein the firstaccount does not have visibility into when or if the function forgenerating the secure join key was executed by the second account.

Example 9 is a system as in any of Examples 1-8, wherein theinstructions of the procedural logic further comprise: determining anumber of common data entries between the first account and the secondaccount; determining whether the number of common data entries meets athreshold number defined by an account; in response to determining thenumber of common data entries meets the threshold number, returningresults of the comparison function to the account; and in response todetermining the number of common data entries does not meet thethreshold number, withholding a notification of the results of thecomparison function from being delivered to the account.

Example 10 is a system as in any of Examples 1-9, further comprisingmeans for jittering results of the function for generating the securejoin key by introducing a random factor that does not comprise a dataentry found in either of the first account or the second account.

Example 11 is a method for performing a secure join on database data.The method includes determining data stored in a first account to becompared with data stored in a second account. The method includesdetermining a function for generating a secure join key, wherein thesecure join key comprises a hashed string that hashes one or more of: adata entry of the first account; a first salt string associated with thefirst account; a first account identification number associated with thefirst account; a data entry of the second account; a second salt stringassociated with the second account; or a second account identificationnumber associated with the second account. The method includes providingthe secure join key to the first account and/or the second account.

Example 12 is a method as in Example 11, wherein the secure join keycomprises a double salted hash value and the function is configured togenerate the secure join key based on: the data entry of the firstaccount with the first salt string associated with the first account;and the data entry of the second account with the second salt stringassociated with the second account; wherein original non-hashed ornon-salted data cannot be identified by the first account or the secondaccount based on the secure join key.

Example 13 is a method as in any of Examples 11-12, further comprisingcomparing the data stored in the first account with the data stored inthe second account based on the secure join key, wherein the comparingcomprises: identifying common data entries between the first account andthe second account; determining a number of common data entries betweenthe first account and the second account; or enriching the data storedin the first account and/or the data stored in the second account basedon common data entries between the first account and the second account.

Example 14 is a method as in any of Examples 11-13, wherein the functiongenerates the secure join key further based on a salted data entryreceived from the second account that does not refer to any real dataentry in the second account and is included to insert noise into thesecure join key.

Example 15 is a method as in any of Examples 11-14, further comprisingdetermining a comparison function that comprises procedural logic havinginstructions for comparing the data stored in the first account with thedata stored in the second account, the instructions of the procedurallogic comprising: comparing a first hashed datapoint from the firstaccount with a second hashed datapoint from the second account, whereinthe first hashed datapoint and the second hashed datapoint are generatedby the function; determining whether the first hashed datapoint is thesame as the second hashed datapoint; and in response to determining thatthe first hashed datapoint and the second hashed datapoint are the same,generating an indication that the first account and the second accounthave a common datapoint.

Example 16 is a method as in any of Examples 11-15, wherein theinstructions of the procedural logic further comprise, in response todetermining that the first hashed datapoint and the second hasheddatapoint are the same: reverting the first hashed datapoint to a firstoriginal datapoint; reverting the second hashed datapoint to a secondoriginal datapoint; and enriching the data stored in the first accountand/or the data stored in the second account with the first originaldatapoint and/or the second original datapoint.

Example 17 is a method as in any of Examples 11-16, wherein the functionfor generating the secure join key is defined by the first account andis executed on an execution platform associated with the second account.

Example 18 is a method as in any of Examples 11-17, wherein the firstaccount does not have visibility into when or if the function forgenerating the secure join key was executed by the second account.

Example 19 is a method as in any of Examples 11-18, wherein theinstructions of the procedural logic further comprise: determining anumber of common data entries between the first account and the secondaccount; determining whether the number of common data entries meets athreshold number defined by an account; in response to determining thenumber of common data entries meets the threshold number, returningresults of the comparison function to the account; and in response todetermining the number of common data entries does not meet thethreshold number, withholding a notification of the results of thecomparison function from being delivered to the account.

Example 20 is a method as in any of Examples 11-19, further comprisingjittering results of the function for generating the secure join key byintroducing a random factor that does not comprise a data entry found ineither of the first account or the second account.

Example 21 is a processor that is programmable to execute instructionsstored in non-transitory computer readable storage media. Theinstructions include determining data stored in a first account to becompared with data stored in a second account. The instructions includedetermining a function for generating a secure join key, wherein thesecure join key comprises a hashed string that hashes one or more of: adata entry of the first account; a first salt string associated with thefirst account; a first account identification number associated with thefirst account; a data entry of the second account; a second salt stringassociated with the second account; or a second account identificationnumber associated with the second account. The method includes providingthe secure join key to the first account and/or the second account.

Example 22 is a processor as in Example 21, wherein the secure join keycomprises a double salted hash value and the function is configured togenerate the secure join key based on: the data entry of the firstaccount with the first salt string associated with the first account;and the data entry of the second account with the second salt stringassociated with the second account;

wherein original non-hashed or non-salted data cannot be identified bythe first account or the second account based on the secure join key.

Example 23 is a processor as in any of Examples 21-22, wherein theinstructions further comprise comparing the data stored in the firstaccount with the data stored in the second account based on the securejoin key, wherein the comparing comprises: identifying common dataentries between the first account and the second account; determining anumber of common data entries between the first account and the secondaccount; or enriching the data stored in the first account and/or thedata stored in the second account based on common data entries betweenthe first account and the second account.

Example 24 is a processor as in any of Examples 21-23, wherein thefunction generates the secure join key further based on a salted dataentry received from the second account that does not refer to any realdata entry in the second account and is included to insert noise intothe secure join key.

Example 25 is a processor as in any of Examples 21-24, wherein theinstructions further comprise determining a comparison function thatcomprises procedural logic having instructions for comparing the datastored in the first account with the data stored in the second account,the instructions of the procedural logic comprising: comparing a firsthashed datapoint from the first account with a second hashed datapointfrom the second account, wherein the first hashed datapoint and thesecond hashed datapoint are generated by the function; determiningwhether the first hashed datapoint is the same as the second hasheddatapoint; and in response to determining that the first hasheddatapoint and the second hashed datapoint are the same, generating anindication that the first account and the second account have a commondatapoint.

Example 26 is an apparatus including means to perform a method orrealize an apparatus or system as in any of Examples 1-25.

Example 27 is a machine-readable storage including machine-readableinstructions, when executed, to implement a method or realize anapparatus of any of Examples 1-25.

Various techniques, or certain aspects or portions thereof, may take theform of program code (i.e., instructions) embodied in tangible media,such as floppy diskettes, CD-ROMs, hard drives, a non-transitorycomputer readable storage medium, or any other machine-readable storagemedium wherein, when the program code is loaded into and executed by amachine, such as a computer, the machine becomes an apparatus forpracticing the various techniques. In the case of program code executionon programmable computers, the computing device may include a processor,a storage medium readable by the processor (including volatile andnon-volatile memory and/or storage elements), at least one input device,and at least one output device. The volatile and non-volatile memoryand/or storage elements may be a RAM, an EPROM, a flash drive, anoptical drive, a magnetic hard drive, or another medium for storingelectronic data. One or more programs that may implement or utilize thevarious techniques described herein may use an application programminginterface (API), reusable controls, and the like. Such programs may beimplemented in a high-level procedural or an object-oriented programminglanguage to communicate with a computer system. However, the program(s)may be implemented in assembly or machine language, if desired. In anycase, the language may be a compiled or interpreted language, andcombined with hardware implementations.

It should be understood that many of the functional units described inthis specification may be implemented as one or more components, whichis a term used to more particularly emphasize their implementationindependence. For example, a component may be implemented as a hardwarecircuit comprising custom very large-scale integration (VLSI) circuitsor gate arrays, off-the-shelf semiconductors such as logic chips,transistors, or other discrete components. A component may also beimplemented in programmable hardware devices such as field programmablegate arrays, programmable array logic, programmable logic devices, orthe like.

Components may also be implemented in software for execution by varioustypes of processors. An identified component of executable code may, forinstance, comprise one or more physical or logical blocks of computerinstructions, which may, for instance, be organized as an object, aprocedure, or a function. Nevertheless, the executables of an identifiedcomponent need not be physically located together but may comprisedisparate instructions stored in different locations that, when joinedlogically together, comprise the component and achieve the statedpurpose for the component.

Indeed, a component of executable code may be a single instruction, ormany instructions, and may even be distributed over several differentcode segments, among different programs, and across several memorydevices. Similarly, operational data may be identified and illustratedherein within components and may be embodied in any suitable form andorganized within any suitable type of data structure. The operationaldata may be collected as a single data set or may be distributed overdifferent locations including over different storage devices, and mayexist, at least partially, merely as electronic signals on a system ornetwork. The components may be passive or active, including agentsoperable to perform desired functions.

Reference throughout this specification to “an example” means that aparticular feature, structure, or characteristic described in connectionwith the example is included in at least one embodiment of the presentdisclosure. Thus, appearances of the phrase “in an example” in variousplaces throughout this specification are not necessarily all referringto the same embodiment.

As used herein, a plurality of items, structural elements, compositionalelements, and/or materials may be presented in a common list forconvenience. However, these lists should be construed as though eachmember of the list is individually identified as a separate and uniquemember. Thus, no individual member of such list should be construed as ade facto equivalent of any other member of the same list solely based onits presentation in a common group without indications to the contrary.In addition, various embodiments and examples of the present disclosuremay be referred to herein along with alternatives for the variouscomponents thereof. It is understood that such embodiments, examples,and alternatives are not to be construed as de facto equivalents of oneanother but are to be considered as separate and autonomousrepresentations of the present disclosure.

Although the foregoing has been described in some detail for purposes ofclarity, it will be apparent that certain changes and modifications maybe made without departing from the principles thereof. It should benoted that there are many alternative ways of implementing both theprocesses and apparatuses described herein. Accordingly, the presentembodiments are to be considered illustrative and not restrictive.

Those having skill in the art will appreciate that many changes may bemade to the details of the above-described embodiments without departingfrom the underlying principles of the disclosure. The scope of thepresent disclosure should, therefore, be determined only by thefollowing claims.

What is claimed is:
 1. A method comprising: generating a secure userdefined function (UDF) comprising a one-way hash; converting, by thesecure UDF, datapoints of a first account and datapoints of a secondaccount into a secure join key, wherein the secure join key isunidentifiable to the first account and the second account based on theone-way hash; and determining, by a processor, a count value ofoverlapping datapoints between the first account and the second accountbased on the secure join key.
 2. The method of claim 1, wherein, duringthe generating of the secure UDF, the first account inserts an accountidentifier into the one-way hash that identifies the second account. 3.The method of claim 2, wherein the account identifier that identifiesthe second account prohibits the secure UDF to measure overlappingdatapoints associated with a third account.
 4. The method of claim 2,further comprising: receiving the secure UDF at the second account fromthe first account; and generating the secure join key at the secondaccount, wherein the second account is devoid of visibility into thedatapoints of the first account.
 5. The method of claim 1, wherein thesecure join key is a stream of hashed data comprising hashed datapointsof the first account and hashed datapoints of the second account, themethod further comprising: salting the hashed datapoints of the secondaccount, wherein the salting of the datapoints of the second accountcomprises: including, with the salted hashed datapoints, one or moreadditional salted hashed datapoints that do not represent realdatapoints of the consumer account.
 6. The method of claim 5, whereinthe secure join key is based at least in part on the salted hasheddatapoints that do not represent the real datapoints of the consumeraccount.
 7. The method of claim 1, wherein the secure UDF comprises anSQL script configured to count the overlapping datapoints between thefirst account and the second account.
 8. A system comprising: a memory;and a processor operatively coupled to the memory, the processor to:generate a secure user defined function (UDF) comprising a one-way hash;convert, by the secure UDF, datapoints of a first account and datapointsof a second account into a secure join key, wherein the secure join keyis unidentifiable to the first account and the second account based onthe one-way hash; and determine a count value of overlapping datapointsbetween the first account and the second account based on the securejoin key.
 9. The system of claim 8, wherein the first account inserts anaccount identifier into the one-way hash that identifies the secondaccount.
 10. The system of claim 9, wherein the account identifier thatidentifies the second account prohibits the secure UDF to measureoverlapping datapoints associated with a third account.
 11. The systemof claim 9, wherein the processor is further to: receive the secure UDFat the second account from the first account; and generate the securejoin key at the second account, wherein the second account is devoid ofvisibility into the datapoints of the first account.
 12. The system ofclaim 11, wherein the secure join key is a stream of hashed datacomprising hashed datapoints of the first account and hashed datapointsof the second account, and wherein the processor is further to: salt thehashed datapoints of the second account and include one or moreadditional salted hashed datapoints, with the salted hashed datapoints,that do not represent real datapoints of the consumer account.
 13. Thesystem of claim 12, wherein the secure join key is based at least inpart on the salted hashed datapoints that do not represent the realdatapoints of the consumer account.
 14. The system of claim 8, whereinthe secure UDF comprises an SQL script configured to count theoverlapping datapoints between the first account and the second account.15. A non-transitory computer-readable medium having instructions storedthereon which, when executed by a processor, cause the processor to:generate a secure user defined function (UDF) comprising a one-way hash;convert, by the secure UDF, datapoints of a first account and datapointsof a second account into a secure join key, wherein the secure join keyis unidentifiable to the first account and the second account based onthe one-way hash; and determine a count value of overlapping datapointsbetween the first account and the second account based on the securejoin key.
 16. The non-transitory computer-readable medium of claim 15,wherein the first account inserts an account identifier into the one-wayhash that identifies the second account.
 17. The non-transitorycomputer-readable medium of claim 16, wherein the account identifierthat identifies the second account prohibits the secure UDF to measureoverlapping datapoints associated with a third account.
 18. Thenon-transitory computer-readable medium of claim 16, wherein theprocessor is further to: receive the secure UDF at the second accountfrom the first account; and generate the secure join key at the secondaccount, wherein the second account is devoid of visibility into thedatapoints of the first account.
 19. The non-transitorycomputer-readable medium of claim 18, wherein the secure join key is astream of hashed data comprising hashed datapoints of the first accountand hashed datapoints of the second account, and wherein the processoris further to: salt the hashed datapoints of the second account andinclude one or more additional salted hashed datapoints, with the saltedhashed datapoints, that do not represent real datapoints of the consumeraccount.
 20. The non-transitory computer-readable medium of claim 19,wherein the secure join key is based at least in part on the saltedhashed datapoints that do not represent the real datapoints of theconsumer account.